Abstract

We aim to characterize the maximum link throughput of a multi-channel opportunistic communication
system. The states of these channels evolve as independent and identically distributed Markov
processes  (the  Gilbert-Elliot  channel  model).  A  user,  with  limited  sensing  and  access  capability,  chooses 
one  channel  to  sense  and  access  in  each  slot  and  collects  a  reward  determined  by  the  state  of  the  chosen 
channel. Such a problem arises in cognitive radio networks for spectrum overlay, opportunistic transmissions
in fading environments, and resource-constrained jamming and anti-jamming. The objective of
this  report  is  to  characterize  the  optimal  performance  of  such  systems.  The  problem  can  be  generally 
formulated  as  obtaining  the  maximum  expected  long-term  reward  of  a  partially  observable  Markov 
decision  process  or  a  restless  multi-armed  bandit  process,  for  which  analytical  characterizations  are 
rare.  Exploiting  the  structure  and  optimality  of  the  myopic  channel  selection  policy  established  recently, 
we  obtain  a  closed-form  expression  of  the  maximum  link  throughput  for  two-channel  systems  and  lower 
and  upper  bounds  when  there  are  more  than  two  channels.  These  results  allow  us  to  study  the  rate  at 
which  the  optimal  performance  of  an  opportunistic  system  increases  with  the  number  of  channels  and 
to obtain the limiting performance as the number of channels approaches infinity.

I. Introduction

The  fundamental  idea  of  opportunistic  communications  is  to  adapt  the  transmission  parameters 
(data rate, modulation, transmission power, etc.) according to the state of the communication environment,
including, for example, fading conditions, interference level, and buffer state. Since the
seminal  work  by  Knopp  and  Humblet  in  1995  [1],  the  concept  of  opportunistic  communications 
has  found  applications  beyond  transmission  over  fading  channels.  An  emerging  application  is 
cognitive  radios  for  spectrum  overlay  (also  referred  to  as  opportunistic  spectrum  access),  where 
secondary  users  search  in  the  spectrum  for  idle  channels  temporarily  unused  by  primary  users 
[2].  Another  application  is  resource-constrained  jamming  and  anti-jamming,  where  a  jammer 
seeks  channels  occupied  by  users  or  a  user  tries  to  avoid  jammers. 

We  take  a  simplified  model  of  these  opportunistic  communication  systems  with  N  parallel 
channels.  These  N  channels  are  modeled  as  independent  and  identically  distributed  Gilbert- 
Elliot  channels  [3]  as  illustrated  in  Fig.  1.  The  state  of  a  channel  —  “good”  (1)  or  “bad” 
(0)  —  indicates  the  desirability  of  accessing  this  channel  and  determines  the  resulting  reward. 
With  limited  sensing  and  access  capability,  a  user  chooses  one  of  the  channels  to  sense  and 
access in each slot, aiming to maximize its expected long-term reward (i.e., throughput). The
objective  of  this  report  is  to  characterize  analytically  the  maximum  throughput  of  such  a  system. 
In  particular,  we  are  interested  in  the  relationship  between  the  maximum  throughput  and  the 
number  of  channels. 


Fig. 1. The Gilbert-Elliot channel model (two states, 0 and 1; the transition probabilities $p_{01}$ and $p_{11}$ are labeled in the figure).


This  problem  can  be  treated  as  a  partially  observable  Markov  decision  process  (POMDP)  [4] 
or more specifically, a restless multi-armed bandit process [5] due to the independence across
channels. The maximum throughput of the multi-channel opportunistic system is essentially
the  maximum  expected  total  reward,  or  the  value  function,  of  a  POMDP  [6].  Unfortunately, 
obtaining  optimal  solutions  to  POMDPs,  even  numerically,  is  often  intractable,  and  closed-form 
expressions  for  value  functions  are  rare. 

In  this  report,  we  obtain  a  closed-form  expression  of  the  maximum  throughput  for  two-channel 
opportunistic  systems.  For  systems  with  more  than  two  channels,  we  develop  lower  and  upper 
bounds  that  monotonically  tighten  as  the  number  N  of  channels  increases.  These  results  allow 
us  to  study  the  rate  at  which  the  optimal  performance  of  an  opportunistic  system  increases  with 
N and to obtain the limiting performance as N approaches infinity. They demonstrate that
the optimal link throughput of a multi-channel opportunistic system with limited sensing quickly
saturates as the number of channels increases.

Our  analysis  hinges  on  the  structure  and  optimality  of  the  myopic  policy  established  in  [7], 
[8].  The  optimality  of  the  myopic  policy  makes  it  sufficient  to  obtain  the  maximum  throughput 
from  the  performance  of  the  myopic  policy,  and  the  simple  structure  of  the  myopic  policy  makes 
it  possible  to  characterize  analytically  its  performance.  Specifically,  based  on  the  structure  of 
the  myopic  policy,  we  show  that  the  performance  of  the  myopic  policy  is  determined  by  the 
steady-state  distributions  of  a  discrete  random  process  with  countable  sample  space.  For  N  =  2, 
this  random  process  is  a  first-order  Markov  chain.  We  obtain  the  stationary  distribution  of  this 
Markov chain in closed form, leading to an exact characterization of the maximum throughput. For
N > 2, we construct first-order Markov processes that stochastically dominate or are dominated
by the discrete random process. The stationary distributions of these constructed processes, again obtained in
closed form, lead to lower and upper bounds on the maximum throughput.

II.  Problem  Formulation 

We  consider  the  scenario  where  a  user  is  trying  to  access  the  wireless  spectrum  using  a 
slotted  transmission  structure.  The  spectrum  consists  of  N  independent  and  statistically  identical 
channels. The state $S_i(t)$ of channel $i$ in slot $t$ is given by a two-state discrete-time Markov
chain  shown  in  Fig.  1. 

At  the  beginning  of  each  slot,  the  user  selects  one  of  the  N  channels  to  sense.  If  the  channel 
is  sensed  to  be  in  the  “good”  state  (state  1),  the  user  transmits  and  collects  one  unit  of  reward. 
Otherwise the user does not transmit (or transmits at a lower rate), collects no reward, and waits
until the next slot to make another choice. The objective is to maximize the average reward
(throughput)  over  a  horizon  of  T  slots  by  choosing  judiciously  a  sensing  policy  that  governs 
channel  selection  in  each  slot. 

Due to limited sensing, the system state $[S_1(t), \ldots, S_N(t)] \in \{0,1\}^N$ in slot $t$ is not fully
observable to the user. The user can, however, infer the state from its decision and observation history. It
has  been  shown  that  a  sufficient  statistic  of  the  system  for  optimal  decision  making  is  given  by  the 
conditional  probability  that  each  channel  is  in  state  1  given  all  past  decisions  and  observations  [4]. 
Referred to as the belief vector, this sufficient statistic is denoted by $\Omega(t) = [\omega_1(t), \ldots, \omega_N(t)]$,
where $\omega_i(t)$ is the conditional probability that $S_i(t) = 1$. Given the sensing action $a$ and the
observation $S_a$ in slot $t$, the belief vector for slot $t+1$ can be obtained as follows.

\[
\omega_i(t+1) =
\begin{cases}
p_{11}, & a = i,\ S_a = 1 \\
p_{01}, & a = i,\ S_a = 0 \\
\omega_i(t)\,p_{11} + (1 - \omega_i(t))\,p_{01}, & a \neq i
\end{cases}
\tag{1}
\]
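The update in (1) can be sketched in a few lines of Python. This is only an illustration: the function names `update_belief` and `stationary_belief` and the 0-based channel indexing are assumptions made here, not notation from the report.

```python
def stationary_belief(p01, p11):
    """Stationary probability of state 1 for the two-state channel."""
    return p01 / (p01 + (1.0 - p11))


def update_belief(omega, action, observation, p01, p11):
    """One-slot belief update following (1).

    omega       -- list of current beliefs, one per channel
    action      -- index (0-based, an assumption of this sketch) of the sensed channel
    observation -- observed state of the sensed channel, 0 or 1
    """
    new_omega = []
    for i, w in enumerate(omega):
        if i == action:
            # The sensed channel is known exactly, so its belief resets.
            new_omega.append(p11 if observation == 1 else p01)
        else:
            # Unobserved channels propagate through the Markov chain.
            new_omega.append(w * p11 + (1.0 - w) * p01)
    return new_omega


# Hypothetical parameters: three channels, all starting at the stationary belief.
omega = [stationary_belief(0.2, 0.8)] * 3
omega = update_belief(omega, action=0, observation=1, p01=0.2, p11=0.8)
print(omega)
```

Only the sensed entry changes discontinuously; every other entry relaxes toward the stationary belief.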


A sensing policy $\pi$ specifies a sequence of functions $\pi = [\pi_1, \pi_2, \ldots, \pi_T]$, where $\pi_t$ maps a
belief vector $\Omega(t)$ to a sensing action $a(t) \in \{1, \ldots, N\}$ for slot $t$. Multi-channel opportunistic
access can thus be formulated as the following stochastic control problem.

\[
\pi^* = \arg\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\, \sum_{t=1}^{T} R\big(\pi_t(\Omega(t))\big) \,\Big|\, \Omega(1) \right],
\]


where $R(\pi_t(\Omega(t)))$ is the reward obtained when the belief is $\Omega(t)$ and channel $\pi_t(\Omega(t))$ is
selected, and $\Omega(1)$ is the initial belief vector. If no information on the initial system state is
available, each entry of $\Omega(1)$ can be set to the stationary distribution $\omega_0$ of the underlying
Markov chain:

\[
\omega_0 = \frac{p_{01}}{p_{01} + p_{10}}. \tag{2}
\]


III.  Structure  and  Optimality  of  Myopic  Policy 
A.  The  Value  Function 

Let $V_t(\Omega)$ be the value function, which represents the maximum expected total reward that can
be obtained starting from slot $t$ given the current belief vector $\Omega$. Given that the user takes action
$a$ and observes $S_a$, the reward that can be accumulated starting from slot $t$ consists of two parts:
the expected immediate reward $R_a(\Omega) = \omega_a$ and the maximum expected future reward $V_{t+1}(\mathcal{T}(\Omega \mid a, S_a))$,
where $\mathcal{T}(\Omega \mid a, S_a)$ denotes the updated belief vector for slot $t+1$ as given in (1). Averaging
over all possible observations $S_a$ and maximizing over all actions $a$, we arrive at the following
optimality equation.

\[
\begin{aligned}
V_T(\Omega) &= \max_{a = 1, \ldots, N} \omega_a, \\
V_t(\Omega) &= \max_{a = 1, \ldots, N} \Big( \omega_a + \omega_a V_{t+1}\big(\mathcal{T}(\Omega \mid a, 1)\big) + (1 - \omega_a) V_{t+1}\big(\mathcal{T}(\Omega \mid a, 0)\big) \Big).
\end{aligned}
\tag{3}
\]

In theory, the optimal policy $\pi^*$ and its performance $V_1(\Omega(1))$ can be obtained by solving the
above dynamic program. Unfortunately, due to the impact of the current action on the future
reward and the uncountable space of the belief vector $\Omega$, obtaining the optimal solution
directly from the above recursive equations is computationally prohibitive. Even when approximate
numerical solutions can be obtained, they do not provide insight into system design or analytical
characterizations of the optimal performance $V_1(\Omega(1))$.
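For very small $N$ and $T$ the recursion can still be evaluated by brute force, which is a convenient way to compare a candidate policy with the optimum. The sketch below is an illustration under stated assumptions, not the authors' code: the names `make_value_functions`, `optimal`, and `myopic` are hypothetical, channels are 0-indexed, and the recursion is parameterized by the number of remaining slots rather than by $t$.

```python
from functools import lru_cache


def make_value_functions(p01, p11):
    """Return (optimal, myopic); each maps (belief tuple, slots left) to a value."""

    def step(omega, a, obs):
        # Belief update: the sensed channel resets, the others propagate.
        return tuple(
            (p11 if obs else p01) if i == a else w * p11 + (1 - w) * p01
            for i, w in enumerate(omega)
        )

    def action_value(omega, a, slots_left, cont):
        # Immediate reward plus expected continuation value for action a.
        w = omega[a]
        return (w
                + w * cont(step(omega, a, 1), slots_left - 1)
                + (1 - w) * cont(step(omega, a, 0), slots_left - 1))

    @lru_cache(maxsize=None)
    def optimal(omega, slots_left):
        if slots_left == 0:
            return 0.0
        return max(action_value(omega, a, slots_left, optimal)
                   for a in range(len(omega)))

    @lru_cache(maxsize=None)
    def myopic(omega, slots_left):
        if slots_left == 0:
            return 0.0
        # Myopic rule: sense the channel with the largest current belief.
        a = max(range(len(omega)), key=lambda i: omega[i])
        return action_value(omega, a, slots_left, myopic)

    return optimal, myopic


# Hypothetical example: two channels, six slots, stationary initial belief.
optimal, myopic = make_value_functions(p01=0.2, p11=0.8)
start = (0.5, 0.5)
print(optimal(start, 6), myopic(start, 6))
```

The cost grows roughly as $(2N)^T$, which is why this is usable only as a sanity check on tiny instances.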

B.  The  Myopic  Policy 

A  myopic  policy  ignores  the  impact  of  the  current  action  on  the  future  reward,  focusing  solely 
on maximizing the expected immediate reward $R(\Omega)$. Myopic policies are thus stationary. The
myopic action $\hat{a}$ under belief state $\Omega = [\omega_1, \ldots, \omega_N]$ is simply given by

\[
\hat{a}(\Omega) = \arg\max_{a = 1, \ldots, N} \omega_a. \tag{4}
\]

In  general,  obtaining  the  myopic  action  in  each  slot  requires  the  recursive  update  of  the 
belief vector $\Omega$ as given in (1), which requires the knowledge of the transition probabilities
$\{p_{ij}\}$. Interestingly, it has been shown in [7], [9] that the myopic policy has a simple structure
that  does  not  need  the  update  of  the  belief  vector  or  the  precise  knowledge  of  the  transition 
probabilities. 

The  basic  structure  of  the  myopic  policy  is  a  round-robin  scheme  based  on  a  circular  ordering 
of the channels. For $p_{11} > p_{01}$, the circular order is constant and determined by a descending
order of the initial belief values. The myopic action is to stay in the same channel when it is
good (state 1) and switch to the next channel in the circular order when it is bad. In the case
of $p_{11} < p_{01}$, the circular order is reversed in every slot with the initial order determined by the
initial  belief  values.  The  myopic  policy  stays  in  the  same  channel  when  it  is  bad;  otherwise,  it 
switches to the next channel in the current circular order.

Another way to see the channel switching structure of the myopic policy is through the last visit
to each channel (once every channel has been visited at least once). Specifically, for $p_{11} > p_{01}$,
when a channel switch is needed, the policy selects the channel visited the longest time ago. For
$p_{11} < p_{01}$, when a channel switch is needed, the policy selects, among those channels to which
the last visit occurred an even number of slots ago, the one most recently visited. If there are
no such channels, the user chooses the channel visited the longest time ago.

Note that the above simple structure of the myopic policy reveals that, other than the order of
$p_{11}$ and $p_{01}$, knowledge of the transition probabilities is unnecessary.
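For the case $p_{11} > p_{01}$, the round-robin structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `run_round_robin`, the seeded simulation of channel states, and the belief vector kept alongside (used only to cross-check that the head of the circular order always carries the largest belief) are all assumptions of this sketch. The reversed-order case $p_{11} < p_{01}$ is not covered.

```python
import random
from collections import deque


def run_round_robin(n, p01, p11, slots, seed=0):
    """Round-robin myopic sensing for p11 > p01.

    Returns (average reward per slot, whether the sensed channel always
    had the largest belief).
    """
    rng = random.Random(seed)
    w0 = p01 / (p01 + 1.0 - p11)
    states = [int(rng.random() < w0) for _ in range(n)]
    belief = [w0] * n          # tracked only to cross-check the structure
    order = deque(range(n))    # circular order; order[0] is the sensed channel
    reward, agrees = 0, True
    for _ in range(slots):
        a = order[0]
        agrees = agrees and belief[a] >= max(belief) - 1e-12
        obs = states[a]
        reward += obs
        if obs == 0:
            order.rotate(-1)   # a bad channel moves to the back of the order
        belief = [(p11 if obs else p01) if i == a else w * p11 + (1 - w) * p01
                  for i, w in enumerate(belief)]
        states = [int(rng.random() < (p11 if s else p01)) for s in states]
    return reward / slots, agrees


throughput, agrees = run_round_robin(n=5, p01=0.2, p11=0.8, slots=5000, seed=1)
print(throughput, agrees)
```

The channel selection itself uses only the observations and the circular order; the transition probabilities enter only through the simulated channels and the cross-check.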

Surprisingly,  the  myopic  policy  with  such  a  simple  and  robust  structure  achieves  the  optimal 
performance for $N = 2$ [7], [9]. It has been conjectured in [7], [9] (based on numerical examples¹)
that the optimality of the myopic policy can be generalized to $N > 2$. In a recent work [8], the
optimality of the myopic policy has been established for a general $N$ under the condition of
$p_{11} > p_{01}$.

C.  Simulation  Examples 

1)  Figure  2  below  shows  the  throughput  (average  reward  per  slot)  as  a  function  of  time, 
where $N = 10$, $p_{11} = 0.1$, $p_{01} = 0.9$. The throughput achieved by the myopic policy
increases  with  time,  which  results  from  the  improved  information  on  the  channel  state 
drawn  from  accumulating  observations.  This  demonstrates  that  the  myopic  policy  can 
learn  from  observations  and  track  channels  with  the  good  state  more  effectively  as  the 
observations  accumulate.  Up  to  50%  gain  can  be  achieved  over  random  sensing  whose 
performance  is  static  with  time. 

2)  Another  example  is  shown  in  Figure  3,  where  we  assume  the  channel  transition  probabilities 
change from $p_{01} = 0.1$, $p_{11} = 0.6$ to $p_{01} = 0.4$, $p_{11} = 0.9$ at $t = 6$. Note that after the
change,  each  channel  is  more  likely  to  be  in  the  good  state.  From  Figure  3,  we  can  see 
that  the  myopic  policy  can  track  this  change  in  the  system  model;  the  throughput  improves 
significantly  after  t  =  5. 
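A comparison of this kind can be reproduced in outline with a short Monte Carlo simulation. The sketch below is an assumption-laden illustration rather than the setup behind the figures: the function name `simulate`, the seed, the run length, and the tie-breaking rule (lowest index wins) are all choices made here. The myopic rule is implemented directly as the argmax of the belief vector, so it applies to both $p_{11} > p_{01}$ and $p_{11} < p_{01}$.

```python
import random


def simulate(n, p01, p11, slots, policy, seed=0):
    """Average reward per slot of `policy` ('myopic' or 'random')."""
    rng = random.Random(seed)
    w0 = p01 / (p01 + 1.0 - p11)
    states = [int(rng.random() < w0) for _ in range(n)]
    belief = [w0] * n
    reward = 0
    for _ in range(slots):
        if policy == "myopic":
            a = max(range(n), key=lambda i: belief[i])  # largest belief
        else:
            a = rng.randrange(n)                        # uniform random channel
        obs = states[a]
        reward += obs
        belief = [(p11 if obs else p01) if i == a else w * p11 + (1 - w) * p01
                  for i, w in enumerate(belief)]
        states = [int(rng.random() < (p11 if s else p01)) for s in states]
    return reward / slots


# Parameters of the first example: N = 10, p11 = 0.1, p01 = 0.9.
print(simulate(10, 0.9, 0.1, 20000, "myopic"))
print(simulate(10, 0.9, 0.1, 20000, "random"))
```

Random sensing should hover around the stationary probability $\omega_0$, while the myopic policy exploits the memory in the channel states.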


¹ Among extensive examples, $p_{01}$ and $p_{11}$ are randomly chosen from the interval [0, 1], $N$ is chosen between 3 and 7, and $T$
is chosen between 1 and 20. We compare the myopic actions with the optimal actions in each example, which shows that the
myopic policy is optimal.

Fig. 2. Myopic policy vs. random sensing policy.


Fig. 3. Tracking the change in channel transition probabilities that occurred at $t = 6$ ($p_{11} = 0.6$, $p_{01} = 0.1$ for $t \le 5$; $p_{11} = 0.9$, $p_{01} = 0.4$ for $t > 5$).

IV. Link Throughput Limits

The  objective  here  is  to  characterize  the  link  throughput  limit  U  of  multi-channel  opportunistic 
access  with  limited  sensing. 


A.  Uniqueness  of  Steady-State  Performance  and  Its  Numerical  Evaluation 

We  first  establish  the  existence  and  uniqueness  of  the  system  steady  states  under  the  myopic 
policy.  The  steady-state  throughput  of  the  myopic  policy  is  given  by 


\[
U(\Omega(1)) = \lim_{T \to \infty} \frac{V_{1:T}(\Omega(1))}{T}, \tag{5}
\]

where $V_{1:T}(\Omega(1))$ is the expected total reward obtained in $T$ slots under the myopic policy when
the initial belief is $\Omega(1)$.

The simple structure of the myopic policy allows us to work with a Markov reward process
with a finite state space instead of one with an uncountable state space (i.e., belief vectors) as
we encounter in a general POMDP. Details are stated in the theorem below.

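Theorem 1: Let $S^{(i)}(t)$ denote the state of the $i$-th channel in the current circular order,
where the starting point of the circular order is fixed to the myopic action, i.e., $\hat{a}(t) = 1$ for
all $t$. Then $\{\vec{S}(t) = [S^{(1)}(t), \ldots, S^{(N)}(t)]\}$ forms a $2^N$-state Markov chain with transition
probabilities $\{q_{\vec{i},\vec{j}}\}$ given in (6), and the performance of the myopic policy is determined by the
Markov reward process $(\vec{S}(t), R(t))$ with $R(t) = S^{(1)}(t)$.

\[
q_{\vec{i},\vec{j}} =
\begin{cases}
\prod_{k=1}^{N} p_{i_k, j_k}, & i_1 = 1 \\
p_{i_1, j_N} \prod_{k=2}^{N} p_{i_k, j_{k-1}}, & i_1 = 0
\end{cases}
\quad (p_{11} > p_{01}),
\qquad
q_{\vec{i},\vec{j}} =
\begin{cases}
\prod_{k=1}^{N} p_{i_k, j_{N-k+1}}, & i_1 = 1 \\
p_{i_1, j_1} \prod_{k=2}^{N} p_{i_k, j_{N-k+2}}, & i_1 = 0
\end{cases}
\quad (p_{11} < p_{01}),
\tag{6}
\]

where $\vec{i} = [i_1, i_2, \ldots, i_N]$ and $\vec{j} = [j_1, j_2, \ldots, j_N]$.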

Proof: The proof follows directly from the structure of the myopic policy by noticing that
$S^{(1)}(t)$ determines the channel ordering in $\vec{S}(t+1)$ and each channel evolves as an independent Markov chain.
Specifically, for $p_{11} > p_{01}$, if $S^{(1)}(t) = 1$, the channel ordering in $\vec{S}(t+1)$ is the same as that
in $\vec{S}(t)$; if $S^{(1)}(t) = 0$, the first channel in $\vec{S}(t)$ is moved to the last position in $\vec{S}(t+1)$ with the
ordering of the remaining $N - 1$ channels intact. For $p_{11} < p_{01}$, if $S^{(1)}(t) = 0$, the first channel in $\vec{S}(t)$
remains the first in $\vec{S}(t+1)$ while the ordering of the remaining channels is reversed; if $S^{(1)}(t) = 1$,
the ordering of all $N$ channels is reversed. The transition probabilities given in (6) thus follow.

From Theorem 1, $U(\Omega(1))$ is determined by the Markov reward process $(\vec{S}(t), R(t))$. It is easy
to see that the $2^N$-state Markov chain $\{\vec{S}(t)\}$ is irreducible and aperiodic, and thus has a limiting
distribution. As a consequence, the limit in (5) exists, and the steady-state throughput $U$ is
independent of the initial belief value $\Omega(1)$.

Theorem 1 also provides a numerical approach to evaluating $U$ by calculating the limiting
(stationary) distribution of $\{\vec{S}(t)\}$, whose transition probabilities are given in (6). Specifically,
the throughput $U$ is given by the summation of the limiting probabilities of those $2^{N-1}$ states
with first entry $S^{(1)} = 1$. This numerical approach, however, does not provide an analytical
characterization of the throughput $U$ in terms of the number $N$ of channels and the transition
probabilities $\{p_{ij}\}$. In the next section, we obtain analytical expressions of $U$ and its scaling
behavior with respect to $N$ based on a stochastic dominance argument.
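The numerical approach can be sketched with NumPy as follows. This is an illustration under assumptions, not the authors' code: the function names `myopic_chain` and `throughput` are hypothetical, positions in the circular order are 0-indexed, and the reordering rules are taken from the proof of Theorem 1 (stay or move-to-back for $p_{11} \ge p_{01}$; partial or full reversal for $p_{11} < p_{01}$). The stationary distribution is obtained by a least-squares solve of $\pi Q = \pi$ with the normalization constraint, which is one of several reasonable choices.

```python
import itertools

import numpy as np


def myopic_chain(n, p01, p11):
    """States and transition matrix of the 2^n-state chain under the myopic policy."""
    p = {(0, 0): 1 - p01, (0, 1): p01, (1, 0): 1 - p11, (1, 1): p11}
    states = list(itertools.product((0, 1), repeat=n))
    Q = np.zeros((len(states), len(states)))
    for r, i in enumerate(states):
        # move[k]: position in the next ordering of the channel now at position k.
        if p11 >= p01:
            move = list(range(n)) if i[0] == 1 else [n - 1] + list(range(n - 1))
        elif i[0] == 0:
            move = [0] + [n - k for k in range(1, n)]   # head stays, rest reversed
        else:
            move = [n - 1 - k for k in range(n)]        # whole order reversed
        for c, j in enumerate(states):
            Q[r, c] = np.prod([p[(i[k], j[move[k]])] for k in range(n)])
    return states, Q


def throughput(n, p01, p11):
    """Steady-state probability that the sensed (first) channel is in state 1."""
    states, Q = myopic_chain(n, p01, p11)
    m = len(states)
    # Solve pi Q = pi together with sum(pi) = 1.
    A = np.vstack([Q.T - np.eye(m), np.ones((1, m))])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(sum(pi[r] for r, s in enumerate(states) if s[0] == 1))


for n in range(1, 7):
    print(n, throughput(n, p01=0.2, p11=0.8))
```

The state space doubles with every added channel, so this route is practical only for small $N$.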

B.  Analytical  Characterization  of  Throughput 

Our  analysis  hinges  on  the  structure  and  optimality  of  the  myopic  policy  given  in  Sec.  III-B. 
The  optimality  of  the  myopic  policy  makes  it  sufficient  to  obtain  U  from  the  performance  of  the 
myopic  policy,  and  the  simple  structure  of  the  myopic  policy  makes  it  possible  to  characterize 
analytically  its  performance. 

1) Transmission Period: From the structure of the myopic policy we can see that the key to
the throughput is how often the user switches channels, or equivalently, how long it stays in
the same channel. When $p_{11} > p_{01}$, the event of channel switch is equivalent to a slot without
reward. The opposite holds when $p_{11} < p_{01}$: a channel switch corresponds to a slot with reward.

Fig. 4. The transmission period structure (channel switches are marked in the figure).

We thus introduce the concept of transmission period (TP), which is the time the user stays in the
same channel, as illustrated in Fig. 4. Let $L_k$ denote the length of the $k$-th transmission period.
We thus have a discrete-time random process $\{L_k\}_{k=1}^{\infty}$ with a sample space of positive integers.
It is easy to show that the throughput $U$ is determined by the average length $\bar{L}$ of a transmission
period, as given in Lemma 1 below.

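Lemma 1: Let $\bar{L} = \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} L_k$ denote the average length of a transmission period. The
throughput limit $U$ is given by

\[
U =
\begin{cases}
1 - 1/\bar{L}, & p_{11} > p_{01} \\
1/\bar{L}, & p_{11} < p_{01}
\end{cases}
\tag{7}
\]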

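Proof: When $p_{11} > p_{01}$, the user collects $(L_k - 1)$ units of reward during each transmission
period $L_k$. We obtain $U$ as the average reward over an infinite number of transmission periods:

\[
U = \lim_{K \to \infty} \frac{\sum_{k=1}^{K} (L_k - 1)}{\sum_{k=1}^{K} L_k}
  = 1 - \frac{1}{\lim_{K \to \infty} \frac{\sum_{k=1}^{K} L_k}{K}}
  = 1 - \frac{1}{\bar{L}}, \tag{8}
\]

where $\bar{L}$ denotes the average length of a transmission period.

When $p_{11} < p_{01}$, the user collects 1 unit of reward during each transmission period. We have

\[
U = \lim_{K \to \infty} \frac{\sum_{k=1}^{K} 1}{\sum_{k=1}^{K} L_k}
  = \frac{1}{\lim_{K \to \infty} \frac{\sum_{k=1}^{K} L_k}{K}}
  = \frac{1}{\bar{L}}. \tag{9}
\]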
C.  Link  Throughput  Limit  for  N  =  2 

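For $N = 2$, $\{L_k\}_{k=1}^{\infty}$ is a first-order Markov chain. We have the following lemma.

Lemma 2: $\{L_k\}_{k=1}^{\infty}$ is an irreducible, recurrent, and aperiodic first-order Markov chain with
the following unique stationary distribution (the limiting distribution) $\{\lambda_l\}_{l=1}^{\infty}$.

• Case 1: $p_{11} > p_{01}$

\[
\lambda_l =
\begin{cases}
1 - \bar{\omega}, & l = 1 \\
\bar{\omega}\, p_{11}^{\,l-2}\, p_{10}, & l \ge 2
\end{cases}
\tag{10}
\]

where $\bar{\omega}$ is the expected probability that the channel we switch to is in state 1, i.e., the expected
belief value of the channel we switch to. It is given by

\[
\bar{\omega} = \frac{p_{01}^{(2)}}{1 + p_{01}^{(2)} - A}, \tag{11}
\]

where $p_{01}^{(2)} = p_{00}p_{01} + p_{01}p_{11}$ and $A = \frac{p_{01}}{1 + p_{01} - p_{11}} \left( 1 - \frac{(p_{11} - p_{01})^3 (1 - p_{11})}{1 - (p_{11})^2 + p_{11}p_{01}} \right)$.

• Case 2: $p_{11} < p_{01}$

\[
\lambda_l =
\begin{cases}
\bar{\omega}', & l = 1 \\
(1 - \bar{\omega}')\, p_{00}^{\,l-2}\, p_{01}, & l \ge 2
\end{cases}
\tag{12}
\]

where $\bar{\omega}'$ is the expected probability that the channel we switch to is in state 1. It is given by

\[
\bar{\omega}' = \frac{B}{1 - p_{11}^{(2)} + B}, \tag{13}
\]

where $p_{11}^{(2)} = p_{10}p_{01} + p_{11}p_{11}$ and $B = \frac{p_{01}}{1 + p_{01} - p_{11}} \left( 1 + \frac{(p_{11} - p_{01})^3 (1 - p_{11})}{1 - (1 - p_{01})(p_{11} - p_{01})} \right)$.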


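Proof: Since $\{L_k\}_{k=1}^{\infty}$ is an irreducible, recurrent, and aperiodic first-order Markov chain,
if there exists a stationary distribution $\lambda = [\lambda_1, \lambda_2, \ldots]$, then $\lambda$ is the limiting distribution.

Case 1: $p_{11} > p_{01}$

The transition matrix $Q = \{q_{i,j}\}$ of the Markov chain $\{L_k\}_{k=1}^{\infty}$ is

\[
\begin{aligned}
q_{i,1} &= 1 - p_{01}^{(i+1)}, & i \ge 1, \\
q_{i,j} &= p_{01}^{(i+1)}\, p_{11}^{\,j-2}\, p_{10}, & i \ge 1,\ j \ge 2,
\end{aligned}
\tag{14}
\]

where $p_{01}^{(j)}$ denotes the $j$-step transition probability from state 0 to state 1. Let $Q(:,k)$ denote the $k$-th column of $Q$. We have

\[
\mathbf{1} - Q(:,1) = \frac{Q(:,2)}{p_{10}}, \tag{15}
\]

where $\mathbf{1}$ is the unit column vector $[1, 1, \ldots]^T$. Based on the definition of stationary distribution,
we have

\[
\lambda\, Q(:,1) = \lambda_1, \tag{16}
\]
\[
\lambda\, Q(:,2) = \lambda_2. \tag{17}
\]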


TECHNICAL  REPORT  TR-07-04,  UC  DAVIS ,  JULY  2007. 


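Combining (15)-(17), we have

\[
\lambda_1 = 1 - \frac{\lambda_2}{1 - p_{11}}. \tag{18}
\]

For $k \ge 2$, we have $Q(:,k) = Q(:,2)\,(p_{11})^{k-2}$. Together with the following equations

\[
\lambda\, Q(:,k) = \lambda_k, \tag{19}
\]
\[
\lambda\, Q(:,2) = \lambda_2, \tag{20}
\]

we obtain

\[
\lambda_k = \lambda_2\, p_{11}^{\,k-2}. \tag{21}
\]

Substituting (18) and (21) into (20), we have $\left[\,1 - \frac{\lambda_2}{1 - p_{11}},\ \lambda_2,\ \lambda_2 p_{11},\ \lambda_2 p_{11}^2,\ \ldots\,\right] Q(:,2) = \lambda_2$.
Solving for $\lambda_2$, we have $\lambda_2 = \bar{\omega}\, p_{10}$, which gives us the stationary distribution as

\[
\lambda_k =
\begin{cases}
1 - \bar{\omega}, & k = 1 \\
\bar{\omega}\, p_{11}^{\,k-2}\, p_{10}, & k > 1,
\end{cases}
\tag{22}
\]

where $\bar{\omega} = \frac{p_{01}^{(2)}}{1 + p_{01}^{(2)} - A}$ and $A = \frac{p_{01}}{1 + p_{01} - p_{11}} \left( 1 - \frac{(p_{11} - p_{01})^3 (1 - p_{11})}{1 - (p_{11})^2 + p_{11}p_{01}} \right)$.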


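Case 2: $p_{11} < p_{01}$

The transition matrix $Q = \{q_{i,j}\}$ of the Markov chain $\{L_k\}_{k=1}^{\infty}$ is

\[
\begin{aligned}
q_{i,1} &= p_{11}^{(i+1)}, & i \ge 1, \\
q_{i,j} &= p_{10}^{(i+1)}\, (p_{00})^{j-2}\, p_{01}, & i \ge 1,\ j \ge 2.
\end{aligned}
\tag{23}
\]

Similar to Case 1, we can obtain the stationary distribution $\lambda$ of $Q$ as

\[
\lambda_k =
\begin{cases}
\bar{\omega}', & k = 1 \\
(1 - \bar{\omega}')\, p_{00}^{\,k-2}\, p_{01}, & k > 1,
\end{cases}
\tag{24}
\]

where $\bar{\omega}' = \frac{B}{1 - p_{11}^{(2)} + B}$ and $B = \frac{p_{01}}{1 + p_{01} - p_{11}} \left( 1 + \frac{(p_{11} - p_{01})^3 (1 - p_{11})}{1 - (1 - p_{01})(p_{11} - p_{01})} \right)$.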
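From Lemma 2 and Lemma 1, we obtain the throughput limit $U$ for $N = 2$ as given in the
theorem below.

Theorem 2: For $N = 2$, the throughput limit $U$ is given by

\[
U =
\begin{cases}
\dfrac{\bar{\omega}}{1 - p_{11} + \bar{\omega}}, & p_{11} > p_{01} \\[2ex]
\dfrac{p_{01}}{1 - \bar{\omega}' + p_{01}}, & p_{11} < p_{01}
\end{cases}
\tag{25}
\]

where $\bar{\omega}$ and $\bar{\omega}'$ are given, respectively, in (22) and (24).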
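The closed form for $N = 2$ is easy to evaluate directly. The sketch below is a hedged illustration: the function name `throughput_two_channels` is hypothetical, and the expressions follow Lemma 1 and Lemma 2 as read from this report, with $\bar{L} = 1 + \bar{\omega}/p_{10}$ for $p_{11} > p_{01}$ and $\bar{L} = 1 + (1 - \bar{\omega}')/p_{01}$ for $p_{11} < p_{01}$. The boundary case $p_{11} = p_{01}$ is routed through the first branch as a convention of this sketch.

```python
def throughput_two_channels(p01, p11):
    """Closed-form throughput limit for two channels under the myopic policy."""
    p10, p00 = 1.0 - p11, 1.0 - p01
    x = p11 - p01
    w0 = p01 / (p01 + p10)                     # stationary probability of state 1
    if p11 >= p01:
        p01_2 = p00 * p01 + p01 * p11          # two-step probability 0 -> 1
        A = w0 * (1.0 - x ** 3 * p10 / (1.0 - p11 * p11 + p11 * p01))
        w = p01_2 / (1.0 + p01_2 - A)          # expected belief after a switch
        return w / (p10 + w)
    p11_2 = p10 * p01 + p11 * p11              # two-step probability 1 -> 1
    B = w0 * (1.0 + x ** 3 * p10 / (1.0 - p00 * x))
    w = B / (1.0 - p11_2 + B)                  # expected belief after a switch
    return p01 / (1.0 - w + p01)


print(throughput_two_channels(0.2, 0.8))
print(throughput_two_channels(0.8, 0.2))
```

For memoryless channels ($p_{11} = p_{01} = p$) the expression reduces to $p$, as expected when past observations carry no information.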

D.  Link  Throughput  Limit  for  N  >  2 

For $N > 2$, it is difficult to obtain the average length $\bar{L}$ of a transmission period. Our objective
is to develop lower and upper bounds on the throughput limit $U$.

The approach is to construct first-order Markov chains that stochastically dominate or are
dominated by $\{L_k\}_{k=1}^{\infty}$. The limiting distributions of these first-order Markov chains, which
can be obtained in closed form, thus lead to lower and upper bounds on $U$ according to
Lemma 1. Specifically, for $p_{11} > p_{01}$, a lower bound on $U$ is obtained by constructing a first-order
Markov chain whose limiting distribution is stochastically dominated by the stationary
distribution of $\{L_k\}_{k=1}^{\infty}$. An upper bound on $U$ is given by a first-order Markov chain whose
stationary distribution stochastically dominates the stationary distribution of $\{L_k\}_{k=1}^{\infty}$. Similarly,
bounds on $U$ for $p_{11} < p_{01}$ can be obtained.

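Theorem 3: For $N > 2$, we have the following lower and upper bounds on the throughput
limit $U$.

• Case 1: $p_{11} > p_{01}$

\[
\frac{C}{C + (1 - D + C)(1 - p_{11})} \le U \le \frac{\omega_0}{1 - p_{11} + \omega_0}, \tag{26}
\]

where $\omega_0$ is given by (2), $C = \omega_0 \left(1 - (p_{11} - p_{01})^N\right)$, and $D = \omega_0 \left( 1 - \frac{(p_{11} - p_{01})^{N+1} (1 - p_{11})}{1 - (p_{11})^2 + p_{11}p_{01}} \right)$.

• Case 2: $p_{11} < p_{01}$

\[
1 - \frac{p_{10}^{(2)}}{E - p_{01} H} \le U \le 1 - \frac{p_{10}^{(2)}}{E - p_{01} G}, \tag{27}
\]

where $p_{10}^{(2)} = p_{10}p_{00} + p_{11}p_{10}$,

\[
\begin{aligned}
E &= p_{10}^{(2)} (1 + p_{01}) + p_{01} (1 - F), \\
F &= (1 - p_{01})(1 - \omega_0) \left( \frac{1}{2 - p_{01}} - \frac{p_{01} (p_{11} - p_{01})^4}{1 - (p_{11} - p_{01})^2 (1 - p_{01})^2} \right), \\
G &= (1 - \omega_0) \left( \frac{1}{2 - p_{01}} - \frac{p_{01} (p_{11} - p_{01})^6}{1 - (p_{11} - p_{01})^2 (1 - p_{01})^2} \right), \\
H &= (1 - \omega_0) \left( \frac{1}{2 - p_{01}} - \frac{p_{01} (p_{11} - p_{01})^{2N-1}}{1 - (p_{11} - p_{01})^2 (1 - p_{01})^2} \right).
\end{aligned}
\]

• Monotonicity: for both cases, the difference between the upper and lower bounds monotonically
decreases with $N$; for $p_{11} > p_{01}$, the lower bound converges to the upper bound as
$N \to \infty$.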
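The bounds of Theorem 3 can be tabulated against $N$ with a few lines of Python. This is a sketch under assumptions: the function name `throughput_bounds` is hypothetical, and the constants $C$, $D$, $E$, $F$, $G$, $H$ are coded from the definitions as read in this report; in particular, the $1/(2 - p_{01})$ terms and the leading "$1 -$" in the $p_{11} < p_{01}$ case reflect that reading of (27).

```python
def throughput_bounds(n, p01, p11):
    """Lower and upper bounds on the throughput limit for n channels."""
    p10, p00 = 1.0 - p11, 1.0 - p01
    x = p11 - p01
    w0 = p01 / (p01 + p10)
    if p11 >= p01:
        C = w0 * (1.0 - x ** n)
        D = w0 * (1.0 - x ** (n + 1) * p10 / (1.0 - p11 * p11 + p11 * p01))
        lower = C / (C + (1.0 - D + C) * p10)
        upper = w0 / (p10 + w0)
        return lower, upper
    p10_2 = p10 * p00 + p11 * p10              # two-step probability 1 -> 0
    denom = 1.0 - (x * p00) ** 2
    F = p00 * (1.0 - w0) * (1.0 / (2.0 - p01) - p01 * x ** 4 / denom)
    G = (1.0 - w0) * (1.0 / (2.0 - p01) - p01 * x ** 6 / denom)
    H = (1.0 - w0) * (1.0 / (2.0 - p01) - p01 * x ** (2 * n - 1) / denom)
    E = p10_2 * (1.0 + p01) + p01 * (1.0 - F)
    return 1.0 - p10_2 / (E - p01 * H), 1.0 - p10_2 / (E - p01 * G)


for n in range(3, 9):
    print(n, throughput_bounds(n, p01=0.2, p11=0.8))
```

Printing the pairs for increasing $N$ makes the saturation visible: the upper bound does not depend on $N$, and the lower bound approaches its limit geometrically.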


Proof: 

Case  1:  pn  >  p0i 

Let $\omega_k$ denote the belief value of the chosen channel in the first slot of the $k$-th transmission period (TP). The length
$L_k(\omega_k)$ of this TP has the following distribution.

\[
\Pr[L_k(\omega_k) = l] =
\begin{cases}
1 - \omega_k, & l = 1 \\
\omega_k\, p_{11}^{\,l-2}\, p_{10}, & l > 1
\end{cases}
\tag{28}
\]

It is easy to see that if $\omega' \ge \omega$, then $L_k(\omega')$ stochastically dominates $L_k(\omega)$.

Fig. 5. The $j$-step transition probabilities of the Gilbert-Elliot channel.

From the round-robin structure of the myopic policy, $\omega_k = p_{01}^{(J_k)}$, where $J_k = \sum_{i=1}^{N-1} L_{k-i} + 1$.
Based on the monotonic increasing property of the $j$-step transition probability $p_{01}^{(j)}$ (see Fig. 5),
we have $\omega_k \le \omega_0$, where $\omega_0$ is the stationary distribution of the Gilbert-Elliot channel given in
(2). $L_k(\omega_0)$ thus stochastically dominates $L_k(\omega_k)$, and the expectation of the former, $\bar{L}_k(\omega_0) =
1 + \frac{\omega_0}{1 - p_{11}}$, leads to the upper bound of $U$ given in (26).
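For reference, the $j$-step transition probabilities of a two-state chain have a standard closed form, which makes the monotonicity used above explicit. This is a textbook identity written in the notation of this report, not a formula quoted from it:

```latex
\begin{align*}
p_{01}^{(j)} &= \omega_0 \left( 1 - (p_{11} - p_{01})^{j} \right), \\
p_{11}^{(j)} &= \omega_0 + (1 - \omega_0)\,(p_{11} - p_{01})^{j},
\qquad \omega_0 = \frac{p_{01}}{p_{01} + p_{10}}.
\end{align*}
```

For $p_{11} > p_{01}$ the factor $(p_{11} - p_{01})^{j}$ is positive and shrinks with $j$, so $p_{01}^{(j)}$ increases monotonically toward $\omega_0$. For $p_{11} < p_{01}$ the factor alternates in sign, so $p_{11}^{(j)}$ lies above $\omega_0$ and decreases along even $j$, and lies below $\omega_0$ and increases along odd $j$.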

Next, we prove the lower bound of $U$ by constructing a hypothetical system where the initial
belief value of the chosen channel in a TP is a lower bound of that in the real system. The average
TP length in this hypothetical system is thus smaller than that in the real system, leading to
a lower bound on $U$ based on (7). Specifically, since $\omega_k = p_{01}^{(J_k)}$ and $J_k = \sum_{i=1}^{N-1} L_{k-i} + 1 \ge
N + L_{k-1} - 1$, we have $\omega_k \ge p_{01}^{(N + L_{k-1} - 1)}$. We thus construct a hypothetical system given by a
first-order Markov chain $\{L'_k\}_{k=1}^{\infty}$ with the following transition probabilities $\{r_{i,j}\}$.

\[
r_{i,j} =
\begin{cases}
1 - p_{01}^{(N+i-1)}, & i \ge 1,\ j = 1 \\
p_{01}^{(N+i-1)}\, (p_{11})^{j-2}\, p_{10}, & i \ge 1,\ j \ge 2
\end{cases}
\tag{29}
\]

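Lemma 3: The stationary distribution of the first-order Markov chain $\{L'_k\}_{k=1}^{\infty}$ is stochastically
dominated by the stationary distribution of $\{L_k\}_{k=1}^{\infty}$.

Proof:

Let $\omega'_k$ denote the expected probability that the chosen channel is in state 1 in the first slot of
the $k$-th transmission period of $\{L'_k\}_{k=1}^{\infty}$, and let $\omega_k$ denote the corresponding quantity for $\{L_k\}_{k=1}^{\infty}$. Assume that in the $k$-th transmission period, the distributions
of $L'_k$ and $L_k$ are both equal to the same distribution $\lambda$, which may or may not be the stationary
distribution of $\{L_k\}_{k=1}^{\infty}$. Next we show $\omega_{k+n} \ge \omega'_{k+n}$ for any $n \ge 1$ by induction.

When $n = 1$, we have

\[
\begin{aligned}
\omega_{k+1} &= \sum_{l=1}^{\infty} E_{L_{k-N+2}, \ldots, L_{k-1}} \left[ p_{01}^{\left(1 + \sum_{i=k-N+2}^{k} L_i\right)} \,\Big|\, L_k = l \right] \Pr(L_k = l) \\
&\ge \sum_{l=1}^{\infty} E_{L_{k-N+2}, \ldots, L_{k-1}} \left[ p_{01}^{(N-1+l)} \,\Big|\, L_k = l \right] \Pr(L_k = l) \\
&= \sum_{l=1}^{\infty} p_{01}^{(N-1+l)}\, \lambda_l \\
&= \omega'_{k+1}.
\end{aligned}
\tag{30}
\]

Assume $\omega_{k+n} \ge \omega'_{k+n}$. Then

\[
\begin{aligned}
\omega_{k+n+1} &= \sum_{l=1}^{\infty} E_{L_{k+n-N+2}, \ldots, L_{k+n-1}} \left[ p_{01}^{\left(1 + \sum_{i=k+n-N+2}^{k+n} L_i\right)} \,\Big|\, L_{k+n} = l \right] \Pr(L_{k+n} = l) \\
&\ge \sum_{l=1}^{\infty} E_{L_{k+n-N+2}, \ldots, L_{k+n-1}} \left[ p_{01}^{(N-1+l)} \,\Big|\, L_{k+n} = l \right] \Pr(L_{k+n} = l) \\
&= \sum_{l=1}^{\infty} p_{01}^{(N-1+l)} \Pr(L_{k+n} = l).
\end{aligned}
\tag{31}
\]

Since $\omega_{k+n} \ge \omega'_{k+n}$, by (28), we have

\[
\begin{aligned}
\Pr(L_{k+n} = l) &\le \Pr(L'_{k+n} = l), & \text{if } l = 1; \\
\Pr(L_{k+n} = l) &\ge \Pr(L'_{k+n} = l), & \text{if } l > 1.
\end{aligned}
\tag{32}
\]

Since the smallest number in the series $\{p_{01}^{(N-1+l)}\}_{l=1}^{\infty}$ is the first one (the $j$-step transition probability $p_{01}^{(j)}$ increases with $j$), by (32) and the fact that
$\sum_{l=1}^{\infty} \Pr(L_{k+n} = l) = \sum_{l=1}^{\infty} \Pr(L'_{k+n} = l) = 1$, we have

\[
\sum_{l=1}^{\infty} p_{01}^{(N-1+l)} \Pr(L_{k+n} = l) \ge \sum_{l=1}^{\infty} p_{01}^{(N-1+l)} \Pr(L'_{k+n} = l) = \omega'_{k+n+1}. \tag{33}
\]

Combining (31) and (33), we have $\omega_{k+n+1} \ge \omega'_{k+n+1}$.

By the above induction, we have $\omega_{k+n} \ge \omega'_{k+n}$ for any $n \ge 1$. So the stationary distribution
of the first-order Markov chain $\{L'_k\}_{k=1}^{\infty}$ is dominated by the stationary distribution of $\{L_k\}_{k=1}^{\infty}$.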


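The first-order Markov chain $\{L'_k\}_{k=1}^{\infty}$ has the following transition matrix $S = \{s_{i,j}\}_{i,j=1}^{\infty}$.

\[
\begin{aligned}
s_{i,1} &= 1 - p_{01}^{(N+i-1)}, & i \ge 1, \\
s_{i,j} &= p_{01}^{(N+i-1)}\, (p_{11})^{j-2}\, p_{10}, & i \ge 1,\ j \ge 2.
\end{aligned}
\tag{34}
\]

Let $\bar{L}'$ denote the average length of a transmission period of $\{L'_k\}_{k=1}^{\infty}$. Solving for the stationary
distribution of $\{L'_k\}_{k=1}^{\infty}$ from $S$, we obtain $\bar{L}'$, which leads to a lower bound on $U$ according to
Lemma 3 and Lemma 1.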


Case 2: $p_{11} < p_{01}$

In this case, the larger the initial belief of the chosen channel in a given TP, the smaller the
average length of the TP. On the other hand, (7) shows that $U$ is inversely proportional to the
average TP length. Thus, similar to the case of $p_{11} > p_{01}$, we will construct hypothetical systems
where the initial belief of the chosen channel in a TP is an upper bound or a lower bound of
that in the real system. The former leads to an upper bound on $U$; the latter, to a lower bound on
$U$.

Consider  first  the  upper  bound.  From  the  structure  of  the  myopic  policy,  it  is  clear  that  when 
Lk- 1  is  odd,  in  the  A-th  TP,  the  user  will  switch  to  the  channel  visited  in  the  (k  —  2)-th  TP.  As 
a  consequence,  the  initial  belief  u>k  of  the  k- th  TP  is  given  by  u>k  =  p[±k~1+1\  When  Lfc_i  is 
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even,  we  can  show  that  uk  <  p[ik  1+1).  This  is  because  that  for  N  >  3  and  Lk_i  even,  the  user 

cannot  switch  to  a  channel  visited  Lk_ i  +  2  slots  ago,  and  pf}  decreases  with  j  for  even  j’s  and 

(i) 

p\ {  >  p\  /  for  any  even  j  and  odd  i  (see  Fig.  5).  We  thus  construct  a  hypothetical  system  given 
by  the  first-order  Markov  chain  with  the  following  transition  probabilities. 

p\\'  if  i  is  odd,  j  =  1 

p!o+1)(Poo)j_2Poi,  if  i  is  odd,  j  >2 

(i+ 4) 

pu  ,  if  7  is  even,  j  =  1 

Pio+4)(Poo)j_2.Poi,  if  *  is  even,  j  >  2 

Similar to the proof of Lemma 3, it can be shown that the stationary distribution of $\{L'_k\}_{k=1}^{\infty}$ is stochastically dominated by that of $\{L_k\}_{k=1}^{\infty}$. The former leads to the upper bound of $U$ given in (27).
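The parity argument used above can be made explicit with the standard closed form for a two-state Markov chain. This is a sketch under the assumption that $p_{11}^{(j)}$ and $p_{10}^{(j)}$ denote the $j$-step transition probabilities of the Gilbert-Elliot channel and that $\omega_o = p_{01}/(1 - p_{11} + p_{01})$ is the stationary probability of state 1:

```latex
\begin{align*}
p_{11}^{(j)} &= \omega_o + (1-\omega_o)\,(p_{11}-p_{01})^{j}, \\
p_{10}^{(j)} &= 1 - p_{11}^{(j)} = (1-\omega_o)\bigl(1-(p_{11}-p_{01})^{j}\bigr).
\end{align*}
```

When $p_{11} < p_{01}$, the factor $(p_{11}-p_{01})^{j}$ is positive and shrinking for even $j$ and negative for odd $j$. Hence $p_{11}^{(j)}$ decreases toward $\omega_o$ along even $j$, increases toward $\omega_o$ along odd $j$, and every even-step value exceeds every odd-step value.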

We now consider the lower bound. Similarly, $\omega_k = p_{11}^{(L_{k-1}+1)}$ when $L_{k-1}$ is odd. When $L_{k-1}$ is even, to find a lower bound on $\omega_k$, we need to find the smallest odd $j$ such that the last visit to the channel chosen in the $k$-th TP is $j$ slots ago. From the structure of the myopic policy, the smallest feasible odd $j$ is $L_{k-1}+2N-3$, which corresponds to the scenario where all $N$ channels are visited in turn from the $(k-N+1)$-th TP to the $k$-th TP with $L_{k-N+1} = L_{k-N+2} = \cdots = L_{k-2} = 2$. We thus have $\omega_k \ge p_{11}^{(L_{k-1}+2N-3)}$. We then construct a hypothetical system given by the first-order Markov chain $\{L'_k\}_{k=1}^{\infty}$ with the following transition probabilities.

$$
s_{i,j} =
\begin{cases}
p_{11}^{(i+1)}, & \text{if } i \text{ is odd},\ j = 1, \\
p_{10}^{(i+1)} (p_{00})^{j-2} p_{01}, & \text{if } i \text{ is odd},\ j \ge 2, \\
p_{11}^{(i+2N-3)}, & \text{if } i \text{ is even},\ j = 1, \\
p_{10}^{(i+2N-3)} (p_{00})^{j-2} p_{01}, & \text{if } i \text{ is even},\ j \ge 2.
\end{cases}
$$

The stationary distribution of this hypothetical system leads to the lower bound of $U$ given in (27).
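The two hypothetical systems for $p_{11} < p_{01}$ can be explored numerically. The following is a minimal sketch, not the report's own procedure: it assumes the $j$-step probabilities follow the standard two-state closed form, truncates the infinite state space at a hypothetical `size` (lumping longer TPs into the last state), and finds the stationary distribution by power iteration. The function names and the parameter values are illustrative assumptions. Since (7) and (27) are not reproduced here, the sketch reports only the mean TP lengths rather than the bounds on $U$.

```python
def n_step_p11(p11, p01, n):
    """n-step probability of moving from state 1 to state 1 (two-state chain)."""
    omega = p01 / (1.0 - p11 + p01)  # stationary probability of state 1
    return omega + (1.0 - omega) * (p11 - p01) ** n


def transition_matrix(p11, p01, even_offset, size):
    """Truncated transition matrix of a hypothetical TP-length chain.

    Row i (1-based) is the distribution of the next TP length given that the
    previous TP had length i. The initial belief uses exponent i + 1 for odd i
    and i + even_offset for even i. Lengths >= size are lumped into `size`.
    """
    p00 = 1.0 - p01
    rows = []
    for i in range(1, size + 1):
        offset = 1 if i % 2 == 1 else even_offset
        belief = n_step_p11(p11, p01, i + offset)  # initial belief of next TP
        row = [belief]                             # j = 1: state 1 seen at once
        for j in range(2, size):
            row.append((1.0 - belief) * p00 ** (j - 2) * p01)
        row.append(1.0 - sum(row))                 # remaining tail mass
        rows.append(row)
    return rows


def mean_tp_length(matrix, iterations=500):
    """Mean TP length under the stationary distribution (power iteration)."""
    size = len(matrix)
    dist = [1.0 / size] * size
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size))
                for j in range(size)]
    return sum((j + 1) * dist[j] for j in range(size))


# Illustrative parameters (assumptions, not values from the report).
p11, p01, N = 0.2, 0.6, 5
upper_system = transition_matrix(p11, p01, even_offset=4, size=30)
lower_system = transition_matrix(p11, p01, even_offset=2 * N - 3, size=30)
mean_upper = mean_tp_length(upper_system)
mean_lower = mean_tp_length(lower_system)
print("mean TP length, upper-bound system:", mean_upper)
print("mean TP length, lower-bound system:", mean_lower)
```

The system built for the upper bound starts each TP with a larger belief, so its mean TP length comes out shorter than that of the system built for the lower bound, in line with the inverse relation between $U$ and the average TP length.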


Monotonicity 

The  monotonicity  statements  can  be  shown  by  noticing  that  for  both  cases,  the  lower  bound 
increases  with  N,  while  the  upper  bound  is  not  a  function  of  N. 

∎

Corollary 1: For $p_{11} > p_{01}$, the lower bound on throughput $U$ converges to the constant upper bound at geometric rate $(p_{11} - p_{01})$ as $N$ increases; for $p_{11} < p_{01}$, the lower bound on $U$ converges to a constant at geometric rate $(p_{01} - p_{11})^2$.

Proof:

Let $x = |p_{11} - p_{01}|$. For $p_{11} > p_{01}$, after some simplifications, the lower bound has the form $a + b/(x^N + c)$, where $a, b, c$ ($c \neq 0$) are constants. The upper bound is $a + b/c$. We have

$$
\frac{\left| a + \dfrac{b}{x^N + c} - a - \dfrac{b}{c} \right|}{x^N} \to \frac{|b|}{c^2} \quad \text{as } N \to \infty.
$$

Thus the lower bound converges to the upper bound with geometric rate $x$.

For $p_{11} < p_{01}$, the lower bound has the form $d + e/(x^{2N-1} + f)$, where $d, e, f$ ($f \neq 0$) are constants. It converges to $d + e/f$ as $N \to \infty$. We have

$$
\frac{\left| d + \dfrac{e}{x^{2N-1} + f} - d - \dfrac{e}{f} \right|}{x^{2N}} \to \frac{|e|}{x f^2} \quad \text{as } N \to \infty.
$$

Thus the lower bound converges with geometric rate $x^2$. ∎
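The two limits in the proof can be checked numerically. The sketch below uses hypothetical constants (the report does not give the values of $a, b, c, d, e, f$) and shows that the ratio of successive gaps tends to $x$ in the first case and to $x^2$ in the second.

```python
def gap(exponent, x, num, den):
    """Distance between num/(x**exponent + den) and its limit num/den."""
    return abs(num / (x ** exponent + den) - num / den)


# Hypothetical constants chosen only for illustration.
x = 0.4            # plays the role of |p11 - p01|
b, c = 1.5, 0.8    # case p11 > p01: lower bound a + b/(x**N + c)
e, f = 0.7, 1.2    # case p11 < p01: lower bound d + e/(x**(2N-1) + f)

# Case p11 > p01: successive gaps shrink by a factor that tends to x.
ratio_case1 = gap(13, x, b, c) / gap(12, x, b, c)
scaled_case1 = gap(12, x, b, c) / x ** 12          # tends to |b| / c**2

# Case p11 < p01: raising N by one raises the exponent 2N - 1 by two,
# so successive gaps shrink by a factor that tends to x**2.
N = 7
ratio_case2 = gap(2 * (N + 1) - 1, x, e, f) / gap(2 * N - 1, x, e, f)
scaled_case2 = gap(2 * N - 1, x, e, f) / x ** (2 * N)  # tends to |e| / (x f**2)

print(ratio_case1, x)
print(ratio_case2, x ** 2)
```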

Though it is difficult to get a closed-form throughput limit for $N > 2$, we can calculate the throughput limit numerically by Theorem 1. We show an example of the throughput limit for $N > 2$ and $p_{11} > p_{01}$ as follows.

Fig. 6. The throughput limit for $N > 2$ and $p_{11} > p_{01}$.

E. Link Throughput over A Finite Horizon

It  is  interesting  to  note  that  we  can  obtain  a  closed-form  expression  for  the  throughput  during 
a  finite  period  T  under  certain  conditions. 

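Theorem 4: When $p_{11} > p_{01}$ and $N > T$, the maximum expected total reward over $T$ slots when the initial belief $\Omega(1)$ is given by the stationary distribution $\omega_o$ is a function of $T$ and $\omega_o$:

$$
V(\omega_o, T) = \frac{\omega_o (T-2)}{1 - p_{11} + \omega_o} + \frac{\omega_o (\omega_o - p_{11})^3 \left(1 - (p_{11} - \omega_o)^{T-2}\right)}{(1 - p_{11} + \omega_o)^2} + \omega_o + p_{11}\omega_o + (1 - \omega_o)\omega_o. \tag{37}
$$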


Proof:  From  the  structure  of  the  myopic  policy,  if  the  user  observes  state  1  from  a  channel, 
it  will  stay  on  that  channel.  Otherwise,  it  will  switch  to  a  new  channel.  Clearly,  V  does  not 
depend  on  N  since  at  most  T  channels  need  to  be  considered  during  T  slots. 

In the first slot, the user randomly chooses one channel and gets $\omega_o$ units of reward. Then the user will either stay or switch. This process is a Markov chain with states “stay” and “switch” as shown below. From “stay”, the chain moves to “switch” with probability $1 - p_{11}$; from “switch”, it moves to “stay” with probability $\omega_o$.

Fig. 7. The Markov chain with states “stay” and “switch”.

If the user observes 1 after the first slot, it will stay and get $p_{11}$ units of reward. Otherwise it will switch to a new channel and get $\omega_o$ units of reward. So $V$ is determined by the distribution of the states of the above two-state Markov chain.
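$$
\begin{aligned}
V(\omega_o, T) &= \sum_{M=1}^{T-2} \begin{bmatrix} \omega_o & 1-\omega_o \end{bmatrix} \begin{bmatrix} p_{11} & 1-p_{11} \\ \omega_o & 1-\omega_o \end{bmatrix}^{M} \begin{bmatrix} p_{11} \\ \omega_o \end{bmatrix} + \omega_o + p_{11}\omega_o + (1-\omega_o)\omega_o \\
&= \sum_{M=1}^{T-2} \begin{bmatrix} \omega_o & 1-\omega_o \end{bmatrix} \left\{ \frac{1}{1-p_{11}+\omega_o} \begin{bmatrix} \omega_o & 1-p_{11} \\ \omega_o & 1-p_{11} \end{bmatrix} + \frac{(p_{11}-\omega_o)^{M}}{1-p_{11}+\omega_o} \begin{bmatrix} 1-p_{11} & p_{11}-1 \\ -\omega_o & \omega_o \end{bmatrix} \right\} \begin{bmatrix} p_{11} \\ \omega_o \end{bmatrix} + \omega_o + p_{11}\omega_o + (1-\omega_o)\omega_o \\
&= \frac{\omega_o (T-2)}{1 - p_{11} + \omega_o} + \frac{\omega_o (\omega_o - p_{11})^3 \left(1 - (p_{11} - \omega_o)^{T-2}\right)}{(1 - p_{11} + \omega_o)^2} + \omega_o + p_{11}\omega_o + (1-\omega_o)\omega_o.
\end{aligned}
\tag{38}
$$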


From the above, we immediately see that the link throughput limit $U$ as $N \to \infty$ is given as follows:

$$
U = \lim_{T \to \infty} \frac{V(\omega_o, T)}{T} = \frac{\omega_o}{1 - p_{11} + \omega_o}, \tag{39}
$$

which agrees with the upper bound given in Theorem 3.
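The finite-horizon expression and its limit can be cross-checked against a direct slot-by-slot evaluation of the “stay”/“switch” chain. The sketch below assumes $p_{11} > p_{01}$, $T \ge 2$, and enough channels that a fresh channel is always available on a switch; the function names and parameter values are illustrative assumptions.

```python
def reward_closed_form(p11, p01, T):
    """Expected total reward over T >= 2 slots, per the closed form in (37)."""
    omega = p01 / (1.0 - p11 + p01)   # stationary probability of state 1
    s = 1.0 - p11 + omega
    return (omega * (T - 2) / s
            + omega * (omega - p11) ** 3 * (1.0 - (p11 - omega) ** (T - 2)) / s ** 2
            + omega + p11 * omega + (1.0 - omega) * omega)


def reward_by_recursion(p11, p01, T):
    """Same quantity, accumulated slot by slot over the stay/switch chain."""
    omega = p01 / (1.0 - p11 + p01)
    total = omega        # slot 1: a randomly chosen channel is good w.p. omega
    stay = omega         # probability of being in state "stay" in slot 2
    for _ in range(2, T + 1):
        reward = stay * p11 + (1.0 - stay) * omega
        total += reward
        stay = reward    # P(observe 1 now) = P("stay") in the next slot
    return total


# Illustrative parameters (assumptions, not values from the report).
p11, p01 = 0.8, 0.3
omega_o = p01 / (1.0 - p11 + p01)
for T in (2, 3, 10, 200):
    print(T, reward_closed_form(p11, p01, T), reward_by_recursion(p11, p01, T))

# The per-slot reward approaches omega_o / (1 - p11 + omega_o) as T grows.
limit = omega_o / (1.0 - p11 + omega_o)
print(reward_closed_form(p11, p01, 100000) / 100000, limit)
```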


F.  Numerical  Examples 

In this section, we demonstrate the tightness of the bounds on $U$ given in Sec. IV-D by examining the relative difference $d(N)$ between the upper and the lower bound, where $d(N)$ is defined as the difference between the upper and the lower bound divided by the upper bound. In Fig. 8, we plot $d(N = 5)$ with respect to the upper bound for $p_{11} > p_{01}$. From Fig. 8 we observe that for most values of $p_{11}$ and $p_{01}$, $d(N = 5)$ is below 6%, demonstrating the tightness of the bounds even for a small number of channels. Furthermore, Fig. 8 shows that the bounds are tighter for larger $p_{01}$. Similar observations can be drawn from Fig. 9 for the case of $p_{11} < p_{01}$.
Fig. 8. The relative difference $d(N = 5)$ between the upper and the lower bound for $p_{11} > p_{01}$.


Fig. 9. The relative difference $d(N = 5)$ between the upper and the lower bound for $p_{11} < p_{01}$.
In Fig. 10 and Fig. 11, we examine the rate at which the lower bound approaches the upper bound as $N$ increases. Specifically, we plot the ratio of $d(N = 10)$ to $d(N = 3)$. We observe that in both cases, the lower bound approaches the upper bound quickly. While demonstrating the usefulness of the bounds for small $N$, this observation conveys a pessimistic message: the optimal link throughput of a multi-channel opportunistic system with limited sensing quickly saturates as $N$ increases.


Fig. 10. The rate at which the lower bound approaches the upper bound as $N$ increases ($p_{11} > p_{01}$).


Fig. 11. The rate at which the lower bound approaches the upper bound as $N$ increases ($p_{11} < p_{01}$).
V. Conclusion and Future Work

In this report, we have analyzed the optimal link throughput of multi-channel opportunistic communication systems under an i.i.d. Gilbert-Elliot channel model. The analytical results allow us to systematically examine the impact of the number of channels and the channel dynamics (transition probabilities) on system performance. Future work includes the generalization to cases with sensing errors and non-identical channels. The former can again be addressed by exploiting the structure and optimality of the myopic policy in the presence of sensing errors as established in [10].